Penguin crossfilter

View a running version of this notebook. | Download this project.


Cross-filtering Palmer Penguins

In [1]:
import numpy as np
import pandas as pd

import holoviews as hv
import hvplot.pandas # noqa

hv.extension('bokeh')

In this introduction to building interactive dashboards we will primarily be using 4 libraries:

  1. Pandas: To load and manipulate the data
  2. hvPlot: To quickly generate plots using a simple and familiar API
  3. HoloViews: To link selections between plots easily
  4. Panel: To build a dashboard we can deploy

Building some plots

Let us first load the Palmer penguins dataset (Gorman et al.), which contains measurements of individual penguins from several species:

In [2]:
penguins = pd.read_csv('penguins.csv')
penguins
Out[2]:
(index) | studyName | Sample Number | Species | Region | Island | Stage | Individual ID | Clutch Completion | Date Egg | Culmen Length (mm) | Culmen Depth (mm) | Flipper Length (mm) | Body Mass (g) | Sex | Delta 15 N (o/oo) | Delta 13 C (o/oo) | Comments
0 | PAL0708 | 1 | Adelie Penguin | Anvers | Torgersen | Adult, 1 Egg Stage | N1A1 | Yes | 11/11/07 | 39.1 | 18.7 | 181.0 | 3750.0 | MALE | NaN | NaN | Not enough blood for isotopes.
1 | PAL0708 | 2 | Adelie Penguin | Anvers | Torgersen | Adult, 1 Egg Stage | N1A2 | Yes | 11/11/07 | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE | 8.94956 | -24.69454 | NaN
2 | PAL0708 | 3 | Adelie Penguin | Anvers | Torgersen | Adult, 1 Egg Stage | N2A1 | Yes | 11/16/07 | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE | 8.36821 | -25.33302 | NaN
3 | PAL0708 | 5 | Adelie Penguin | Anvers | Torgersen | Adult, 1 Egg Stage | N3A1 | Yes | 11/16/07 | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE | 8.76651 | -25.32426 | NaN
4 | PAL0708 | 6 | Adelie Penguin | Anvers | Torgersen | Adult, 1 Egg Stage | N3A2 | Yes | 11/16/07 | 39.3 | 20.6 | 190.0 | 3650.0 | MALE | 8.66496 | -25.29805 | NaN
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
328 | PAL0910 | 119 | Gentoo penguin | Anvers | Biscoe | Adult, 1 Egg Stage | N38A1 | No | 12/1/09 | 47.2 | 13.7 | 214.0 | 4925.0 | FEMALE | 7.99184 | -26.20538 | NaN
329 | PAL0910 | 121 | Gentoo penguin | Anvers | Biscoe | Adult, 1 Egg Stage | N39A1 | Yes | 11/22/09 | 46.8 | 14.3 | 215.0 | 4850.0 | FEMALE | 8.41151 | -26.13832 | NaN
330 | PAL0910 | 122 | Gentoo penguin | Anvers | Biscoe | Adult, 1 Egg Stage | N39A2 | Yes | 11/22/09 | 50.4 | 15.7 | 222.0 | 5750.0 | MALE | 8.30166 | -26.04117 | NaN
331 | PAL0910 | 123 | Gentoo penguin | Anvers | Biscoe | Adult, 1 Egg Stage | N43A1 | Yes | 11/22/09 | 45.2 | 14.8 | 212.0 | 5200.0 | FEMALE | 8.24246 | -26.11969 | NaN
332 | PAL0910 | 124 | Gentoo penguin | Anvers | Biscoe | Adult, 1 Egg Stage | N43A2 | Yes | 11/22/09 | 49.9 | 16.1 | 213.0 | 5400.0 | MALE | 8.36390 | -26.15531 | NaN

333 rows × 17 columns
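
The output above shows NaN entries in the isotope and comments columns. As a minimal sketch of how one might check for missing values before plotting, the example below builds a small stand-in frame from three rows of the table (the name `sample` is an assumption made for illustration; in the notebook the same calls would be made on `penguins`):

```python
import numpy as np
import pandas as pd

# Stand-in for `penguins`: three rows copied from the table above,
# restricted to a few columns so the example is self-contained.
sample = pd.DataFrame({
    'Species': ['Adelie Penguin', 'Adelie Penguin', 'Gentoo penguin'],
    'Culmen Length (mm)': [39.1, 39.5, 47.2],
    'Body Mass (g)': [3750.0, 3800.0, 4925.0],
    'Delta 15 N (o/oo)': [np.nan, 8.94956, 7.99184],
})

# Number of missing values in each column.
missing = sample.isna().sum()

# Names of the columns that contain at least one missing value.
columns_with_gaps = missing[missing > 0].index.tolist()
```

Here only the isotope column has a gap, so `columns_with_gaps` contains a single name.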

This diagram provides some background about what these measurements mean:

Next we define an explicit color mapping for each species:

In [3]:
colors = {
    'Adelie Penguin': '#1f77b4',
    'Gentoo penguin': '#ff7f0e',
    'Chinstrap penguin': '#2ca02c'
}
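
The keys of this mapping are matched against the strings in the Species column, and the table above shows that the capitalization is not uniform ('Adelie Penguin' versus 'Gentoo penguin'). A small check like the following sketch can catch a category without a color; `unmapped_categories` is a hypothetical helper written for this illustration, not part of hvPlot or HoloViews:

```python
colors = {
    'Adelie Penguin': '#1f77b4',
    'Gentoo penguin': '#ff7f0e',
    'Chinstrap penguin': '#2ca02c'
}

def unmapped_categories(values, mapping):
    """Return the sorted distinct values that have no entry in `mapping`."""
    return sorted(set(values) - set(mapping))

# Species strings exactly as they appear in the table above.
ok = unmapped_categories(['Adelie Penguin', 'Gentoo penguin'], colors)

# A differently capitalized key is reported as unmapped.
bad = unmapped_categories(['Adelie penguin'], colors)
```

In the notebook one could pass `penguins['Species']` as `values`.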

Now we can start plotting the data with hvPlot, which provides a familiar API to pandas .plot users but generates interactive plots.

We start with a simple scatter plot of the culmen (think bill) length and depth for each species:

In [4]:
scatter = penguins.hvplot.scatter(
    'Culmen Length (mm)', 'Culmen Depth (mm)', c='Species', cmap=colors,
    responsive=True, min_height=300
)
scatter
Out[4]:

Next we generate a histogram of the body mass colored by species:

In [5]:
histogram = penguins.hvplot.hist(
    'Body Mass (g)', by='Species', color=hv.dim('Species').categorize(colors),
    legend=False, alpha=0.5, responsive=True, min_height=300
)
histogram
Out[5]:

Next we count the number of individuals of each species and generate a bar plot:

In [6]:
bars = penguins.hvplot.bar(
    'Species', 'Individual ID', c='Species', cmap=colors,
    responsive=True, min_height=300
).aggregate(function=np.count_nonzero)
bars
Out[6]:
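
To cross-check the bar heights, the per-species counts can also be computed directly in pandas. This is a sketch on a three-row stand-in frame (the name `sample` is an assumption for illustration), and it is only expected to agree with the aggregated bar plot when every row has a non-empty Individual ID:

```python
import pandas as pd

# Stand-in for `penguins`, using rows copied from the table above.
sample = pd.DataFrame({
    'Species': ['Adelie Penguin', 'Adelie Penguin', 'Gentoo penguin'],
    'Individual ID': ['N1A1', 'N1A2', 'N38A1'],
})

# Count the non-missing Individual ID entries within each species.
counts = sample.groupby('Species')['Individual ID'].count()
```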

Finally we generate violin plots of the flipper length of each species, split by sex:

In [7]:
violin = penguins.hvplot.violin(
    'Flipper Length (mm)', by=['Species', 'Sex'], cmap='Category20',
    responsive=True, min_height=300
).opts(split='Sex')

violin
Out[7]:

Linking the plots

hvPlot lets us build interactive plots very quickly, but what if we want to gain deeper insight into the data by making a selection in one plot and seeing that selection reflected in the others? Using HoloViews we can easily compose and link these plots:

In [8]:
ls = hv.link_selections.instance()

ls(scatter.opts(show_legend=False) + bars + histogram + violin).cols(2)
Out[8]:
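
Conceptually, cross-filtering turns a selection made in one plot into a row filter that is applied to the data behind every other plot. The sketch below illustrates that idea in plain pandas; it is not how HoloViews implements `link_selections` internally, and `box_select` and `sample` are hypothetical names introduced for the example:

```python
import pandas as pd

# Stand-in for `penguins`, using rows copied from the table above.
sample = pd.DataFrame({
    'Species': ['Adelie Penguin', 'Adelie Penguin', 'Adelie Penguin',
                'Gentoo penguin', 'Gentoo penguin'],
    'Culmen Length (mm)': [39.1, 39.5, 40.3, 47.2, 46.8],
    'Culmen Depth (mm)': [18.7, 17.4, 18.0, 13.7, 14.3],
    'Body Mass (g)': [3750.0, 3800.0, 3250.0, 4925.0, 4850.0],
})

def box_select(df, x, x_range, y, y_range):
    """Boolean mask for the rows that fall inside a rectangular selection."""
    return df[x].between(*x_range) & df[y].between(*y_range)

# A box drawn on the scatter plot (culmen length versus depth) ...
mask = box_select(sample,
                  'Culmen Length (mm)', (45.0, 50.0),
                  'Culmen Depth (mm)', (13.0, 15.0))

# ... restricts the rows that feed the other plots, e.g. the histogram.
selected = sample[mask]
selected_masses = selected['Body Mass (g)'].tolist()
```

With these rows the box captures only the two Gentoo penguins, so a linked histogram would highlight just their body masses.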

Building a dashboard

As a final step we will compose these plots into a dashboard using Panel. We begin by loading the Palmer penguins logo:

In [9]:
import panel as pn

logo = pn.panel('logo.png', height=60)
logo
Out[9]:

Next we use the selection_param method of the link_selections object to display the count of currently selected penguins:

In [10]:
def count(selected):
    return pn.pane.Markdown(f"## {len(selected)}/{len(penguins)} penguins selected", align='center')

pn.panel(pn.bind(count, ls.selection_param(penguins)))
Out[10]:
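
The text that `count` renders depends only on two lengths, so the formatting can be checked without Panel or a live selection. The helper name `count_text` below is an assumption made for this sketch and does not appear in the notebook:

```python
def count_text(n_selected, n_total):
    """Markdown heading reporting how many rows are currently selected."""
    return f"## {n_selected}/{n_total} penguins selected"

# With nothing filtered out, all 333 rows of the dataset are selected.
everything = count_text(333, 333)

# After a selection that keeps 12 rows.
subset = count_text(12, 333)
```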

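Next we assemble the content for the dashboard sidebar (a welcome message, artwork, credit, usage instructions and license) into a Column. The selection count will go into the header Row when we deploy the dashboard: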

In [11]:
welcome = "## Welcome and meet the Palmer penguins!"

penguins_art = pn.panel('./lter_penguins.png', width=250)

credit = "### Artwork by @allison_horst"

instructions = """
Use the box-select and lasso-select tools to select a subset of penguins
and reveal more information about the selected subgroup through the power
of cross-filtering.
"""

license = """
### License

Data are available by CC-0 license in accordance with the Palmer Station LTER Data Policy and the LTER Data Access Policy for Type I data.
"""
art = pn.Column(welcome, penguins_art, credit, instructions, license, sizing_mode='stretch_width')
art
Out[11]:

Deploying the dashboard

In [12]:
pn.config.raw_css.append("body .bk-root { font-family: Ubuntu !important; }")

material = pn.template.MaterialTemplate(logo='logo.png', title='Palmer Penguins')

ls = hv.link_selections.instance()

header = pn.Row(
    pn.layout.HSpacer(),
    pn.bind(count, ls.selection_param(penguins)),
    pn.Spacer(width=100),
    sizing_mode='stretch_width'
)
header

selections = ls(scatter.opts(show_legend=False) + bars.opts(show_legend=False) + histogram + violin).cols(2)

material.header.append(header)
material.sidebar.append(art)
material.main.append(selections)

material.servable();
This web page was generated from a Jupyter notebook and not all interactivity will work on this website. Right click to download and run locally for full Python-backed interactivity.